{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Quick Get Started Notebook of Intel® Neural Compressor for Pytorch\n",
    "\n",
    "\n",
    "This notebook is designed to provide an easy-to-follow guide for getting started with the [Intel® Neural Compressor](https://github.com/intel/neural-compressor) (INC) library for [pytorch](https://github.com/pytorch/pytorch) framework.\n",
    "\n",
    "In the following sections, we are going to use a DistilBert model fine-tuned on MRPC as an example to show how to apply post-training quantization on [transformers](https://github.com/huggingface/transformers) models using the INC library.\n",
    "\n",
    "\n",
    "The main objectives of this notebook are:\n",
    "\n",
    "1. Prerequisite: Prepare necessary environment, model and dataset.\n",
    "2. Quantization with INC: Walk through the step-by-step process of applying post-training quantization.\n",
    "3. Benchmark with INC: Evaluate the performance of the FP32 and INT8 models.\n",
    "\n",
    "\n",
    "## 1. Prerequisite\n",
    "\n",
    "### 1.1 Environment\n",
    "\n",
    "If you have Jupyter Notebook, you may directly run this notebook. We will use pip to install or upgrade [neural-compressor](https://github.com/intel/neural-compressor), [pytorch](https://github.com/pytorch/pytorch) and other required packages.\n",
    "\n",
    "Otherwise, you can setup a new environment. First, we install [Anaconda](https://www.anaconda.com/distribution/). Then open an Anaconda prompt window and run the following commands:\n",
    "\n",
    "```shell\n",
    "conda create -n inc_notebook python==3.8\n",
    "conda activate inc_notebook\n",
    "pip install jupyter\n",
    "jupyter notebook\n",
    "```\n",
    "The last command will launch Jupyter Notebook and we can open this notebook in browser to continue.\n",
    "\n",
    "Then, let's install necessary packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# install neural-compressor from source\n",
    "import sys\n",
    "!git clone https://github.com/intel/neural-compressor.git\n",
    "%cd ./neural-compressor\n",
    "!{sys.executable} -m pip install -r requirements.txt\n",
    "!{sys.executable} setup.py install\n",
    "%cd ..\n",
    "\n",
    "# or install stable basic version from pypi\n",
    "!{sys.executable} -m pip install neural-compressor\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# install other packages used in this notebook.\n",
    "!{sys.executable} -m pip install -r requirements.txt\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 Load Dataset\n",
    "\n",
    "The General Language Understanding Evaluation (GLUE) benchmark is a group of nine classification tasks on sentences or pairs of sentences which are:\n",
    "\n",
    "- [CoLA](https://nyu-mll.github.io/CoLA/) (Corpus of Linguistic Acceptability) Determine if a sentence is grammatically correct or not.\n",
    "- [MNLI](https://arxiv.org/abs/1704.05426) (Multi-Genre Natural Language Inference) Determine if a sentence entails, contradicts or is unrelated to a given hypothesis. This dataset has two versions, one with the validation and test set coming from the same distribution, another called mismatched where the validation and test use out-of-domain data.\n",
    "- [MRPC](https://www.microsoft.com/en-us/download/details.aspx?id=52398) (Microsoft Research Paraphrase Corpus) Determine if two sentences are paraphrases from one another or not.\n",
    "- [QNLI](https://rajpurkar.github.io/SQuAD-explorer/) (Question-answering Natural Language Inference) Determine if the answer to a question is in the second sentence or not. This dataset is built from the SQuAD dataset.\n",
    "- [QQP](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) (Quora Question Pairs2) Determine if two questions are semantically equivalent or not.\n",
    "- [RTE](https://aclweb.org/aclwiki/Recognizing_Textual_Entailment) (Recognizing Textual Entailment) Determine if a sentence entails a given hypothesis or not.\n",
    "- [SST-2](https://nlp.stanford.edu/sentiment/index.html) (Stanford Sentiment Treebank) Determine if the sentence has a positive or negative sentiment.\n",
    "- [STS-B](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) (Semantic Textual Similarity Benchmark) Determine the similarity of two sentences with a score from 1 to 5.\n",
    "- [WNLI](https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html) (Winograd Natural Language Inference) Determine if a sentence with an anonymous pronoun and a sentence with this pronoun replaced are entailed or not. This dataset is built from the Winograd Schema Challenge dataset.\n",
    "\n",
    "Here, we use MRPC task. We download and load the required dataset from hub."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datasets\n",
    "import numpy as np\n",
    "import transformers\n",
    "from datasets import load_dataset, load_metric\n",
    "from transformers import (\n",
    "    AutoConfig,\n",
    "    AutoModelForSequenceClassification,\n",
    "    AutoTokenizer,\n",
    "    EvalPrediction,\n",
    "    Trainer,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "task_name = 'mrpc'\n",
    "raw_datasets = load_dataset(\"glue\", task_name)\n",
    "label_list = raw_datasets[\"train\"].features[\"label\"].names\n",
    "num_labels = len(label_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3 Prepare Model\n",
    "Download the pretrained model [textattack/distilbert-base-uncased-MRPC](https://huggingface.co/textattack/distilbert-base-uncased-MRPC) to a pytorch model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_name = 'textattack/distilbert-base-uncased-MRPC'\n",
    "\n",
    "config = AutoConfig.from_pretrained(\n",
    "    model_name,\n",
    "    num_labels=num_labels,\n",
    "    finetuning_task=task_name,\n",
    "    use_auth_token=None,\n",
    ")\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\n",
    "    model_name,\n",
    "    use_auth_token=None,\n",
    ")\n",
    "\n",
    "model = AutoModelForSequenceClassification.from_pretrained(\n",
    "    model_name,\n",
    "    from_tf=False,\n",
    "    config=config,\n",
    "    use_auth_token=None,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 Dataset Preprocessing\n",
    "We need to preprocess the raw dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sentence1_key, sentence2_key = (\"sentence1\", \"sentence2\")\n",
    "padding = \"max_length\"\n",
    "label_to_id = None\n",
    "max_seq_length = 128\n",
    "\n",
    "def preprocess_function(examples):\n",
    "    args = (\n",
    "        (examples[sentence1_key], examples[sentence2_key])\n",
    "    )\n",
    "    result = tokenizer(*args, padding=padding, max_length=max_seq_length, truncation=True)\n",
    "    return result\n",
    "\n",
    "raw_datasets = raw_datasets.map(preprocess_function, batched=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Quantization with Intel® Neural Compressor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 Define metric, evaluate function, and dataloader\n",
    "\n",
    "In this part, we define a GLUE metirc and use it to generate an evaluate function for INC.\n",
    "\n",
    "Refer to doc [metric.md](https://github.com/intel/neural-compressor/blob/master/docs/source/metric.md#build-custom-metric-with-python-api) for how to build your own metric.\n",
    "Refer to doc [dataset.md](https://github.com/intel/neural-compressor/blob/master/docs/source/dataset.md#user-specific-dataset) and [dataloader.md](https://github.com/intel/neural-compressor/blob/master/docs/source/dataloader.md#build-custom-dataloader-with-python-apiapi) for how to build your own dataset and dataloader."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eval_dataset = raw_datasets[\"validation\"]\n",
    "metric = load_metric(\"glue\", task_name)\n",
    "data_collator = None\n",
    "\n",
    "def compute_metrics(p: EvalPrediction):\n",
    "    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions\n",
    "    preds = np.argmax(preds, axis=1)\n",
    "    result = metric.compute(predictions=preds, references=p.label_ids)\n",
    "    if len(result) > 1:\n",
    "        result[\"combined_score\"] = np.mean(list(result.values())).item()\n",
    "    return result\n",
    "\n",
    "# Initialize our Trainer\n",
    "trainer = Trainer(\n",
    "    model=model,\n",
    "    train_dataset=None,\n",
    "    eval_dataset=eval_dataset,\n",
    "    compute_metrics=compute_metrics,\n",
    "    tokenizer=tokenizer,\n",
    "    data_collator=data_collator,\n",
    ")\n",
    "\n",
    "eval_dataloader = trainer.get_eval_dataloader()\n",
    "\n",
    "# for transformers 4.31.0: accelerate dataloader\n",
    "# please use the code below to avoid error \n",
    "if eval_dataloader.batch_size is None:\n",
    "    def _build_inc_dataloader(dataloader):\n",
    "        class INCDataLoader:\n",
    "            __iter__ = dataloader.__iter__\n",
    "            def __init__(self) -> None:\n",
    "                self.dataloader = dataloader\n",
    "                self.batch_size = dataloader.total_batch_size\n",
    "        return INCDataLoader()\n",
    "    eval_dataloader = _build_inc_dataloader(eval_dataloader)\n",
    "batch_size = eval_dataloader.batch_size\n",
    "\n",
    "def take_eval_steps(model, trainer, save_metrics=False):\n",
    "    trainer.model = model\n",
    "    metrics = trainer.evaluate()\n",
    "    bert_task_acc_keys = ['eval_f1', 'eval_accuracy', 'eval_matthews_correlation',\n",
    "                            'eval_pearson', 'eval_mcc', 'eval_spearmanr']\n",
    "    for key in bert_task_acc_keys:\n",
    "        if key in metrics.keys():\n",
    "            throughput = metrics.get(\"eval_samples_per_second\")\n",
    "            print('Batch size = %d' % batch_size)\n",
    "            print(\"Finally Eval {} Accuracy: {}\".format(key, metrics[key]))\n",
    "            print(\"Latency: %.3f ms\" % (1000 / throughput))\n",
    "            print(\"Throughput: {} samples/sec\".format(throughput))\n",
    "            return metrics[key]\n",
    "    assert False, \"No metric returned, Please check inference metric!\"\n",
    "\n",
    "def eval_func(model):\n",
    "    return take_eval_steps(model, trainer)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Run Quantization\n",
    "\n",
    "So far, we can finally start to quantize the model. \n",
    "\n",
    "To start, we need to set the configuration for post-training quantization using `PostTrainingQuantConfig` class. Once the configuration is set, we can proceed to the next step by calling the `quantization.fit()` function. This function performs the quantization process on the model and will return the best quantized model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from neural_compressor.quantization import fit\n",
    "from neural_compressor.config import PostTrainingQuantConfig, TuningCriterion\n",
    "tuning_criterion = TuningCriterion(max_trials=600)\n",
    "conf = PostTrainingQuantConfig(approach=\"static\", tuning_criterion=tuning_criterion)\n",
    "q_model = fit(model, conf=conf, calib_dataloader=eval_dataloader, eval_func=eval_func)\n",
    "q_model.save(\"./saved_results\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Benchmark with Intel® Neural Compressor\n",
    "\n",
    "INC provides a benchmark feature to measure the model performance with the objective settings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# fp32 benchmark\n",
    "!{sys.executable} benchmark.py 2>&1|tee fp32_benchmark.log\n",
    "\n",
    "# int8 benchmark\n",
    "!{sys.executable} benchmark.py --input_model saved_results 2>&1|tee int8_benchmark.log\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.12"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
